Statistical dialect classification based on mean phonetic features
Authors
Abstract
This paper describes a text-dependent method for automatic utterance classification and dialect model selection using mean cepstral and duration features on a per-phoneme basis. From transcribed dialect data, we build a linear discriminant to separate the dialects in feature space. This method is potentially much faster than our previous selection algorithm. We achieve error rates of 8% for distinguishing Northern US speakers from Southern US speakers, and average error rates of 13% on a variety of finer pairwise dialect discriminations. We also describe the training and test corpora collected for this work.
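The abstract does not spell out how the discriminant is built, so the following is only a minimal sketch of a generic two-class Fisher linear discriminant, assuming each utterance has already been reduced to a fixed-length vector of per-phoneme mean features. The function names, the feature dimension, and the synthetic Gaussian data are all hypothetical and stand in for real cepstral and duration measurements; this is not the authors' implementation.

```python
import numpy as np


def fit_fisher_discriminant(X_a, X_b):
    """Fit a two-class Fisher linear discriminant.

    X_a, X_b: arrays of shape (n_utterances, n_features), one row per
    utterance (hypothetical per-phoneme mean features).
    Returns (w, threshold); a row x is assigned to class A when
    x @ w > threshold.
    """
    mean_a = X_a.mean(axis=0)
    mean_b = X_b.mean(axis=0)
    # Pooled within-class scatter matrix.
    centered_a = X_a - mean_a
    centered_b = X_b - mean_b
    scatter = centered_a.T @ centered_a + centered_b.T @ centered_b
    # Small ridge term keeps the solve stable if features are collinear.
    scatter += 1e-6 * np.eye(scatter.shape[0])
    w = np.linalg.solve(scatter, mean_a - mean_b)
    # Midpoint of the projected class means is the decision threshold.
    threshold = 0.5 * (mean_a @ w + mean_b @ w)
    return w, threshold


def classify(X, w, threshold):
    """Return a boolean array: True for class A, False for class B."""
    return X @ w > threshold


def make_synthetic(rng, n_utterances, n_features, shift):
    """Draw synthetic feature vectors (an assumption, not real dialect data)."""
    return rng.normal(loc=shift, scale=1.0, size=(n_utterances, n_features))


def demo_error_rate(seed=0, n_features=6):
    """Train on synthetic data and return the held-out error rate."""
    rng = np.random.default_rng(seed)
    train_a = make_synthetic(rng, 200, n_features, +1.0)
    train_b = make_synthetic(rng, 200, n_features, -1.0)
    w, threshold = fit_fisher_discriminant(train_a, train_b)

    test_a = make_synthetic(rng, 200, n_features, +1.0)
    test_b = make_synthetic(rng, 200, n_features, -1.0)
    # Errors: class A rows labelled B, plus class B rows labelled A.
    errors = np.sum(~classify(test_a, w, threshold))
    errors += np.sum(classify(test_b, w, threshold))
    return errors / (len(test_a) + len(test_b))


print(demo_error_rate())
```

Once the weight vector is fitted, classifying a new utterance costs a single dot product and a comparison, which is one plausible reading of why a method of this kind could be faster than a more involved model selection procedure.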
Similar resources
Discriminative n-gram selection for dialect recognition
Dialect recognition is a challenging and multifaceted problem. Distinguishing between dialects can rely upon many tiers of interpretation of speech data (e.g., prosodic, phonetic, spectral, and word). High-accuracy automatic methods for dialect recognition typically use either phonetic or spectral characteristics of the input. A challenge with spectral systems, such as those based on shifted-delta...
Chinese dialect identification using an acoustic-phonotactic model
In this paper we develop hidden Markov model (HMM) based approaches to identify Chinese dialects spoken in Taiwan. This task can be aided by exploiting various characteristic features of Chinese spoken languages. The baseline system performs phonotactic analysis after the speech utterance is tokenized into a sequence of five broad phonetic classes. The sequential statistics of the resulting sym...
Some Acoustic Cues for Categorizing American English Regional Dialects: An Initial Report on Dialect Variation in Production and Perception
Phonological differences between regional dialects of American English are well established in the sociolinguistics literature. The perception of these phonological differences by naïve listeners is much less well understood, however. Using an existing corpus of spoken sentences produced by talkers from a number of distinct regional dialects in the United States, an acoustic analysis was conduc...
Dialect Distance Assessment Based on 2-dimensional Pitch Slope Features and Kullback Leibler Divergences
Dialect variations of a language have a severe impact on the performance of speech systems. Therefore, knowing how close or separate dialects are in a given language space provides useful information to predict, or improve, system performance when there is a mismatch between train and test data. Distance measures have been used in several applications of speech processing, including speech reco...
Classifying and Clustering Dialects of North American English
This paper presents the results of experiments in which machine learning techniques were applied to the problem of determining regional dialect boundaries. Specifically, decision tree classification and k-means clustering were applied to a corpus of phonetic measurements taken from a large survey of North American English vowels. Pairwise classification and clustering experiments were done for...
Publication date: 1996